Instagram Data

Estrella Hurtado
Alina Valliani

April 25, 2024

Introduction

Our data describes posts from Instagram accounts and was downloaded from Kaggle.com.

Each row is one post, with the account's username, follower and following counts, and the post's likes, comments, and location.

We added several columns to our data: engagement, follower_quantile, engagement_quantile, post_timestamp, post_time, and caption_length.

This data is interesting because it has a large sample of different accounts from which we can draw conclusions about patterns in engagement scores. We also compare engagement across posting times, caption lengths, and post types (pictures versus videos).

Problem Statement and Questions

We chose this data to understand Instagram engagement trends and the factors that contribute to engagement on pictures and videos.

What Libraries did we use?
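
The code in these slides calls read_csv(), glimpse(), mutate(), ntile(), the %>% pipe, and as_datetime(), so a setup along the following lines should cover it. This is a sketch that assumes the tidyverse and lubridate packages are installed.

```r
# Assumed setup: tidyverse attaches readr (read_csv), dplyr (glimpse, mutate,
# ntile), and the %>% pipe
library(tidyverse)

# lubridate provides as_datetime(); tidyverse 2.0.0 and later attach it already,
# so this line only matters on older versions
library(lubridate)
```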

Let's take a look at the Unfiltered Data

insta_data <- read_csv("instagram_data.csv")
glimpse(insta_data)
## Rows: 11,692
## Columns: 14
## $ owner_id        <chr> "36063641", "36063641", "36063641", "36063641", "36063…
## $ owner_username  <chr> "christendominique", "christendominique", "christendom…
## $ shortcode       <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33TadDM…
## $ is_video        <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ caption         <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta3 X …
## $ comments        <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884, 211…
## $ likes           <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287, 74…
## $ created_at      <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 170871…
## $ location        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl        <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.30808-6…
## $ multiple_images <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
## $ username        <chr> "christendominique", "christendominique", "christendom…
## $ followers       <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 2144626, …
## $ following       <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, …

Mutated Columns

Key terms

engagement - refers to the engagement score: likes plus comments, divided by followers, times 100, rounded to two decimals.

follower_quantile - refers to the follower count divided into four equal-sized groups.

engagement_quantile - refers to the engagement score divided into four equal-sized groups.

post_timestamp - refers to the date and time when the picture or video was posted, converted from created_at.

caption_length - refers to the number of space-separated words in the caption.

(In the quantile columns, 1 is the lowest group, 2 and 3 are the middle groups, and 4 is the highest group.)
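
As a small illustration of how the quantile columns work, ntile() from dplyr ranks the values and splits them into groups of equal size. The follower counts below are made up for the example and are not taken from our data.

```r
library(dplyr)

# Hypothetical follower counts for eight accounts (not from our data)
followers <- c(500, 120000, 3000, 9000000, 45000, 800000, 150, 2000000)

# ntile() ranks the values and assigns each one to one of 4 equal-sized groups,
# so the two smallest accounts get 1 and the two largest get 4
follower_quantile <- ntile(followers, 4)

follower_quantile  # 1 3 2 4 2 3 1 4
```

Because the groups are based on rank, each one holds the same number of rows even when the follower counts themselves are very unevenly spread.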

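new_data <- insta_data %>%
  mutate(
    # engagement: likes plus comments as a percentage of followers
    engagement = round(((likes + comments) / followers) * 100, digits = 2),
    # split followers and engagement into four equal-sized groups (1 = lowest)
    follower_quantile = ntile(followers, 4),
    engagement_quantile = ntile(engagement, 4),
    # convert the Unix timestamp to a date-time, then round to the nearest hour
    post_timestamp = as_datetime(created_at),
    post_time = format(round(post_timestamp, units = "hours"), format = "%H:%M"),
    caption_length = lengths(strsplit(caption, ' '))  # space-separated word count
  )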
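
To check the formulas by hand, here is the first row of the data worked through in base R. The numbers come from the glimpse() output; the caption string is a short hypothetical stand-in, and as.POSIXct() is used here as a base R substitute for as_datetime().

```r
# Values from the first row of insta_data
likes <- 16382
comments <- 268
followers <- 2144626
created_at <- 1709326758

# Engagement: likes plus comments as a percentage of followers
engagement <- round(((likes + comments) / followers) * 100, digits = 2)

# created_at is in Unix seconds; interpret it in UTC, as as_datetime() does by default
post_timestamp <- as.POSIXct(created_at, origin = "1970-01-01", tz = "UTC")

# Round to the nearest hour and keep only hours and minutes
post_time <- format(round(post_timestamp, "hours"), format = "%H:%M")

# Count space-separated words (hypothetical short caption, not the real one)
caption_length <- lengths(strsplit("Brunch and iced coffee today", " "))

engagement      # 0.78
post_time       # "21:00"
caption_length  # 5
```

The engagement and post_time values agree with the first entries of those columns in the data with the added columns.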

Filtered Data

The original data did not include an engagement score, so we added new columns with calculated values. The row count stays at 11,692 and the column count goes from 14 to 20.

## Rows: 11,692
## Columns: 20
## $ owner_id            <chr> "36063641", "36063641", "36063641", "36063641", "3…
## $ owner_username      <chr> "christendominique", "christendominique", "christe…
## $ shortcode           <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33T…
## $ is_video            <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
## $ caption             <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta…
## $ comments            <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884,…
## $ likes               <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287…
## $ created_at          <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 17…
## $ location            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl            <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.308…
## $ multiple_images     <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ username            <chr> "christendominique", "christendominique", "christe…
## $ followers           <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 21446…
## $ following           <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 10…
## $ engagement          <dbl> 0.78, 0.44, 0.52, 0.34, 0.81, 0.46, 2.02, 0.21, 0.…
## $ follower_quantile   <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1,…
## $ engagement_quantile <int> 3, 2, 3, 2, 3, 2, 4, 2, 2, 4, 2, 2, 1, 3, 4, 2, 3,…
## $ post_timestamp      <dttm> 2024-03-01 20:59:18, 2024-02-29 21:10:48, 2024-02…
## $ post_time           <chr> "21:00", "21:00", "21:00", "20:00", "20:00", "20:0…
## $ caption_length      <int> 12, 34, 81, 57, 17, 66, 50, 17, 8, 53, 17, 20, 90,…

Key terms

is_video - refers to whether the post is a video (TRUE) or a picture (FALSE).

caption - refers to the text written with the Instagram post.

comments/likes - refers to the number of comments and likes the post received.

created_at - refers to the Unix timestamp (seconds since January 1, 1970) of when the post was created.

multiple_images - refers to a boolean for whether the post was a carousel (multiple image upload).

followers/following - refers to the number of accounts that follow the user and the number of accounts the user follows.

Reference of Account Followers Distribution

This table shows the average follower count in each follower quantile, where 1 is the lowest and 4 is the highest. The NA row corresponds to posts with a missing follower count.

## # A tibble: 5 × 2
##   follower_quantile follower_mean
##               <int> <chr>        
## 1                 1 108,262      
## 2                 2 342,149      
## 3                 3 834,535      
## 4                 4 8,559,178    
## 5                NA NA
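
A table like this can be produced by grouping on follower_quantile and averaging followers. The sketch below uses a small made-up data frame (toy_data is hypothetical) in place of new_data, and the comma formatting with format() is one assumed way to get a text column like follower_mean.

```r
library(dplyr)

# Hypothetical stand-in for new_data with two accounts per quantile
toy_data <- tibble(
  followers = c(1000, 3000, 20000, 40000, 500000, 700000, 5000000, 9000000)
) %>%
  mutate(follower_quantile = ntile(followers, 4))

# Average followers per quantile, formatted as text with thousands separators;
# scientific = FALSE keeps large round numbers from printing as 6e+05
follower_summary <- toy_data %>%
  group_by(follower_quantile) %>%
  summarise(
    follower_mean = format(round(mean(followers)), big.mark = ",", scientific = FALSE)
  )

follower_summary
```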

When do posts get the most engagement?

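This shows the average engagement percent and the number of posts for each posting hour. Because post_timestamp comes from as_datetime(created_at), the hours are in UTC, not each account's local time.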

## # A tibble: 24 × 3
##    post_time `mean(engagement)` `n()`
##    <chr>                  <dbl> <int>
##  1 00:00                   2.08   261
##  2 01:00                   1.88   267
##  3 02:00                   1.71   238
##  4 03:00                   2.99   178
##  5 04:00                   2.37   138
##  6 05:00                   3.34   125
##  7 06:00                   2.38   145
##  8 07:00                   1.59   185
##  9 08:00                   3.81   223
## 10 09:00                   1.93   293
## # ℹ 14 more rows
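
The column names in this output suggest a group_by() on post_time followed by summarise() with mean(engagement) and n(). Here is a minimal sketch of that pattern on made-up data (toy_data and the names mean_engagement and posts are hypothetical), sorted so the best hours come first.

```r
library(dplyr)

# Hypothetical stand-in for new_data: hour posted and engagement percent
toy_data <- tibble(
  post_time  = c("05:00", "05:00", "08:00", "08:00", "08:00", "09:00"),
  engagement = c(3.0, 4.0, 2.0, 4.0, 6.0, 1.5)
)

# Mean engagement and number of posts for each hour, highest mean first
engagement_by_hour <- toy_data %>%
  group_by(post_time) %>%
  summarise(mean_engagement = mean(engagement), posts = n()) %>%
  arrange(desc(mean_engagement))

engagement_by_hour
```

Keeping the post count next to the mean is useful, since an hour with few posts can get a high average from a single popular post.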

Reference of the Post Engagement

We see the most engagement for posts made around 5am, 8am, 12pm, 1pm, 4pm, and 5pm.

What is the relationship between caption length and engagement?

Highest engagement posts include captions with lengths x & y.

The graph shows that short captions gain more engagement.
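
One way to put numbers behind a graph like this is to group captions into length bands and compare average engagement, alongside a rank correlation. The sketch below runs on made-up values (toy_data, the band cutoffs of 20 and 60 words, and the band labels are all assumptions), so its output only illustrates the method and says nothing about our actual result.

```r
library(dplyr)

# Hypothetical stand-in for new_data: caption word count and engagement percent
toy_data <- tibble(
  caption_length = c(5, 12, 18, 40, 55, 90, 120, 150),
  engagement     = c(4.1, 3.8, 3.5, 2.2, 2.0, 1.1, 0.9, 0.7)
)

# Group captions into length bands and average engagement within each band
engagement_by_length <- toy_data %>%
  mutate(length_band = cut(caption_length,
                           breaks = c(0, 20, 60, Inf),
                           labels = c("short", "medium", "long"))) %>%
  group_by(length_band) %>%
  summarise(mean_engagement = mean(engagement), posts = n())

# Spearman rank correlation: a negative value means longer captions
# tend to go with lower engagement
length_cor <- cor(toy_data$caption_length, toy_data$engagement, method = "spearman")

engagement_by_length
length_cor
```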

Between pictures and videos, which gets the most comments and likes?

We see pictures get more comments and likes than videos.
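
A comparison like this can be made by grouping on is_video and averaging likes and comments. The following is a minimal sketch on made-up numbers (toy_data and the post_type labels are hypothetical), not the exact code behind our chart.

```r
library(dplyr)

# Hypothetical stand-in for new_data: two pictures and two videos
toy_data <- tibble(
  is_video = c(FALSE, FALSE, TRUE, TRUE),
  likes    = c(1000, 3000, 800, 1200),
  comments = c(50, 150, 40, 60)
)

# Average likes and comments for pictures versus videos
likes_by_type <- toy_data %>%
  mutate(post_type = if_else(is_video, "video", "picture")) %>%
  group_by(post_type) %>%
  summarise(
    mean_likes    = mean(likes),
    mean_comments = mean(comments),
    posts         = n()
  )

likes_by_type
```

Since accounts differ a lot in follower count, running the same summary on engagement instead of raw likes and comments may give a fairer comparison.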

Summary

In summary, our presentation showed the following about our Instagram account data: